We consider different ways to control the magnification in self-organizing
maps (SOM) and neural gas (NG). Starting from early approaches of 
magnification control in vector quantization, we then concentrate on 
different approaches for SOM and NG. We show that three structurally 
similar approaches can be applied to both algorithms: localized learning, 
concave-convex learning, and winner relaxing learning. Thereby, the 
approach of concave-convex learning in SOM is extended to a more general 
description, whereas the concave-convex learning for NG is new. In general, 
the control mechanisms generate only slightly different behavior comparing 
both neural algorithms. However, we emphasize that the NG results are 
valid for any data dimension, whereas in the SOM case the results hold only 
for the one-dimensional case. 

1 Introduction 

Vector quantization is an important task in data processing, pattern
recognition, and control (Fritzke, 1993; Haykin, 1994; Linde, Buzo, & Gray,
1980; Ripley, 1996). A large number of different types have been discussed
(for an overview, refer to Haykin, 1994; Kohonen, 1995; Duda & Hart, 1973).
Neural maps are a popular type of neural vector quantizer, commonly used
in, for example, data visualization, feature extraction, principal
component analysis, image processing, classification tasks, and
acceleration of common vector quantization (de Bodt, Cottrell, Letremy, &
Verleysen, 2004). Well-known approaches are the self-organizing map (SOM)
(Kohonen, 1995), the neural gas (NG) (Martinetz, Berkovich, & Schulten,
1993), the elastic net (EN) (Durbin & Willshaw, 1987), and generative
topographic mapping (GTM) (Bishop, Svensen, & Williams, 1998).

In vector quantization, data vectors $\mathbf{v} \in \mathbb{R}^d$ are
represented by a few codebook or weight vectors $\mathbf{w}_i$, where $i$
is an arbitrary index. Several criteria exist to evaluate the quality of a
vector quantizer. The most common one is the squared reconstruction error.
However, other quality criteria are also known, for instance, topographic
quality for neighborhood-preserving mapping approaches (Bauer & Pawelzik,
1992; Bauer, Der, & Villmann, 1999), optimization of mutual information
(Linsker, 1989), and other criteria (for an overview, see Haykin, 1994).
Generally, a faithful representation of the data space by the codebooks is
desired. This property is
closely related to the so-called magnification, which describes the relation 
between data and weight vector density for a given model. The knowledge 
of magnification of a map is essential for correct interpretation of its output 
(Hammer & Villmann, 2003). In addition, explicit magnification control is 
a desirable property of learning algorithms if, depending on the respective
application, only sparsely covered regions of the data space have to be
emphasized or, conversely, suppressed. The magnification can be expressed
explicitly for several vector quantization models. Usually, for these
approaches the magnification follows a power law between the codebook
vector density $\rho$ and the data density $P$. The respective exponent is
called the magnification exponent or magnification factor. As explained in
more detail below, the magnification is also related to other properties of
the map, for example, the reconstruction error and the mutual information.
Hence, controlling the magnification influences these properties too.

In biologically motivated approaches, magnification can also be seen in the
context of information representation in brains, for instance, in the
sensorimotor cortex (Ritter, Martinetz, & Schulten, 1992). Magnification
and its control can be related to biological phenomena like the perceptual
magnet effect, which refers to the fact that rarely occurring stimuli are
differentiated with high precision, whereas frequent stimuli are
distinguished only roughly (Kuhl, 1991; Kuhl, Williams, Lacerda, Stevens, &
Lindblom, 1992). It is a kind of attention-based learning with inverted
magnification, that is, rarely occurring input samples are emphasized by an
increased learning gain (Der & Herrmann, 1992; Herrmann, Bauer, & Der,
1994). This effect is also beneficial in technical systems. In
remote-sensing image analysis, for instance, rarely found ground cover
classes should be detected, whereas usual (frequent) classes with broad
variance should be suppressed (Merenyi & Jain, 2004; Villmann, Merenyi, &
Hammer, 2003). Another technical environment for magnification control is
robotics, for the accurate description of dangerous navigation states
(Villmann & Heinze, 2000).

In this article we concentrate on a general framework for magnification
control in SOM and NG. In this context, we briefly review the most
important approaches. One approach for SOM is generalized and afterward
transferred to NG. For this purpose, we first give the basic notations,
followed in section 3 by a more detailed description of magnification and
early approaches related to magnification control, including a unified
approach for controlling strategies. The magnification control approaches
of SOM are described according to the unified framework in section 4,
whereby one of them is significantly extended. The same procedure is
applied to NG in section 5. Again, one of the control approaches presented
in this section is new. A short discussion concludes the article.

2 Basic Concepts and Notations in SOM and 
NG 

In general, neural maps project data vectors $\mathbf{v}$ from a (possibly
high-dimensional) data manifold $V \subseteq \mathbb{R}^d$ onto a set $A$
of neurons $i$, which is formally written as $\Psi_{V \to A}: V \to A$.
Each neuron $i$ is associated with a pointer $\mathbf{w}_i \in
\mathbb{R}^d$, all of which establish the set $W = \{\mathbf{w}_i\}_{i \in
A}$. The mapping description is a winner-take-all rule, that is, a stimulus
vector $\mathbf{v} \in V$ is mapped onto that neuron $s \in A$ whose
pointer $\mathbf{w}_s$ is closest to the presented stimulus vector
$\mathbf{v}$,

$$\Psi_{V \to A}: \mathbf{v} \mapsto s(\mathbf{v}) = \operatorname{argmin}_{i \in A} \|\mathbf{v} - \mathbf{w}_i\|. \qquad (2.1)$$

The neuron $s$ is called the winner neuron. The set $R_i = \{\mathbf{v} \in
V \mid \Psi_{V \to A}(\mathbf{v}) = i\}$ is called the (masked) receptive
field of the neuron $i$. The weight vectors are adapted during the learning
process such that the data distribution is represented.

For further investigations, we describe SOM and NG, the neural maps we
focus on, in more detail. During the adaptation process a sequence of data
points $\mathbf{v} \in V$ is presented to the map with respect to the data
distribution $P(V)$. Then the most proximate neuron $s$ according to
equation (2.1) is determined, and the pointer $\mathbf{w}_s$, as well as
all pointers $\mathbf{w}_i$ of neurons in the neighborhood of $s$, are
shifted toward $\mathbf{v}$, according to

$$\Delta \mathbf{w}_i = \epsilon\, h(i, \mathbf{v}, W)\, (\mathbf{v} - \mathbf{w}_i). \qquad (2.2)$$

The property of "being in the neighborhood of $s$" is represented by a
neighborhood function $h(i, \mathbf{v}, W)$. The neighborhood function is
defined as

$$h_\lambda(i, \mathbf{v}, W) = \exp\left(-\frac{k_i(\mathbf{v}, W)}{\lambda}\right) \qquad (2.3)$$

for the NG, where $k_i(\mathbf{v}, W)$ yields the number of pointers
$\mathbf{w}_j$ for which the relation $\|\mathbf{v} - \mathbf{w}_j\| <
\|\mathbf{v} - \mathbf{w}_i\|$ is valid (Martinetz et al., 1993); in
particular, we have $h_\lambda(s, \mathbf{v}, W) = 1.0$. In the case of
SOM, the set $A$ of neurons has a topological structure, usually chosen as
a hypercube or hexagonal lattice. Each neuron $i$ has a fixed position
$\mathbf{r}(i)$. The neighborhood function has the form

$$h_\sigma(i, \mathbf{v}, W) = \exp\left(-\frac{\|\mathbf{r}(i) - \mathbf{r}(s(\mathbf{v}))\|^2}{2\sigma^2}\right). \qquad (2.4)$$

In contrast to the NG, the neighborhood function of SOM is evaluated in
the output space $A$ according to its topological structure. This
difference causes the significantly different properties of the two
algorithms. For the SOM, no energy function exists such that the adaptation
rule follows a gradient descent (Erwin, Obermayer, & Schulten, 1992).
Moreover, the convergence proofs are valid only for the one-dimensional
setting (Cottrell, Fort, & Pages, 1998; Ritter et al., 1992). Introducing
an energy function leads to different dynamics, as in the EN (Durbin &
Willshaw, 1987), or to a new winner determination rule (Heskes, 1999). The
advantage of the SOM is the ordered topological structure of neurons in
$A$. In contrast, in the original NG, such an order is not given. One can
extend the NG to the topology representing network (TRN) such that
topological relations between neurons are installed during learning,
although generally they do not achieve the simple structure of SOM lattices
(Martinetz & Schulten, 1994). Finally, the important advantage of the NG is
that the adaptation of the weight vectors follows a potential-minimizing
dynamics (Martinetz et al., 1993).
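
The winner rule and the two neighborhood functions can be sketched in a few
lines of plain Python. This is a minimal illustration under stated
assumptions, not the authors' implementation: the function names, the
list-of-lists layout (one weight vector per entry of `W`, one lattice
position per entry of `positions`), and the Gaussian normalization
$2\sigma^2$ in the SOM neighborhood are choices made for this sketch.

```python
import math


def distance(v, w):
    """Euclidean distance between two vectors given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))


def winner(v, W):
    """Winner-take-all rule (2.1): index of the closest weight vector."""
    return min(range(len(W)), key=lambda i: distance(v, W[i]))


def h_ng(v, W, lam):
    """NG neighborhood (2.3): exp(-k_i / lambda), where k_i counts the
    weight vectors strictly closer to v than w_i."""
    dist = [distance(v, w) for w in W]
    return [math.exp(-sum(dj < di for dj in dist) / lam) for di in dist]


def h_som(v, W, positions, sigma):
    """SOM neighborhood (2.4): Gaussian in the lattice positions r(i).
    The factor 2 * sigma**2 is an assumption of this sketch."""
    s = winner(v, W)
    return [math.exp(-distance(r, positions[s]) ** 2 / (2.0 * sigma ** 2))
            for r in positions]


def update(v, W, h, eps):
    """Learning step (2.2): shift every w_i toward v, weighted by h_i."""
    return [[wk + eps * hi * (vk - wk) for vk, wk in zip(v, w)]
            for w, hi in zip(W, h)]
```

Calling `update(v, W, h_ng(v, W, lam), eps)` gives one NG step and
`update(v, W, h_som(v, W, positions, sigma), eps)` one SOM step; the only
difference is whether the neighborhood is evaluated by distance rank in the
data space or by lattice distance in the output space.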

3 Magnification and Magnification Control in 
Vector Quantization 

3.1 Magnification in Vector Quantization. 

Usually vector quantization aims to minimize the reconstruction error
$RE = \int_V \|\mathbf{v} - \mathbf{w}_{s(\mathbf{v})}\|^2 P(\mathbf{v})\, d\mathbf{v}$.
However, other quality criteria are also known, for instance, topographic
quality (Bauer & Pawelzik, 1992; Bauer et al., 1999). More generally, one
can consider the generalized distortion error,

$$E_\gamma = \int_V \|\mathbf{v} - \mathbf{w}_{s(\mathbf{v})}\|^\gamma P(\mathbf{v})\, d\mathbf{v}. \qquad (3.1)$$

This error is closely related to other properties of the (neural) vector
quantizer. One important property is the weight vector density
$\rho(\mathbf{w})$ achieved after learning in relation to the data density
$P(\mathbf{v})$. Generally, for vector quantizers one finds the relation

$$\rho(\mathbf{w}) \propto P(\mathbf{w})^\alpha \qquad (3.2)$$

after the learning process has converged (Zador, 1982). The exponent
$\alpha$ is called the magnification exponent or magnification factor. The
magnification is coupled with the generalized distortion error (3.1) by

$$\alpha = \frac{d}{d + \gamma}, \qquad (3.3)$$

Table 1: Magnification of Different Neural Maps and Vector Quantization
Approaches.

Model            Magnification                                  Reference
---------------  ---------------------------------------------  -----------------------------
Elastic net      1 _|_ K 1                                      Claussen and Schuster (2002)
SOM              $(1 + 12 M_2(\sigma)) / (3 + 18 M_2(\sigma))$  Dersch and Tavan (1995)
Linsker network  $1$                                            Linsker (1989)
LBG              $d / (d + 2)$                                  Zador (1982)
FSCL             $(3\beta + 1) / (3\beta + 3)$                  Galanopoulos and Ahalt (1996)
NG               $d / (d + 2)$                                  Martinetz et al. (1993)

Note: For SOM, $M_2(\sigma)$ denotes the second normalized moment of the
neighborhood function, which depends on the neighborhood range $\sigma$.

where $d$ is the intrinsic or Hausdorff dimension¹ of the data. Beginning
with the pioneering work of Amari (1980), who investigated a
resolution-density relation of map formation in a neural field model and
extended the approach of Willshaw and von der Malsburg (1976), the
magnification relation has been considered for several neural map and
vector quantizer approaches, including the investigation of the relation
between data and model density.

Generally, different magnification factors are obtained for different
vector quantization approaches. An overview of several important models
with known magnification factors is given in Table 1.

For the usual SOMs, mapping a one-dimensional input space onto a chain of
neurons,

$$\alpha_{\mathrm{SOM}} = \frac{2}{3} \qquad (3.4)$$

holds in the limit $1 \ll \sigma \ll N$ (Ritter & Schulten, 1986). For
small values of the neighborhood range $\sigma$, the neighborhood ceases to
be of influence, and the magnification rate approaches the value $\alpha =
\frac{1}{3}$ (Dersch & Tavan, 1995). The influence of different types of
neighborhood function was studied in detail for SOMs in Dersch and Tavan
(1995), which extends the early works of Luttrell (1991) and Ritter (1991).
The magnification depends on the second normalized moment $M_2$ of the
neighborhood function, which itself is determined by the neighborhood range
$\sigma$. Van Hulle (2000) extensively discussed the influence of kernel
approaches in SOMs. Results for magnification of discrete SOMs can be found
in Ritter (1989) and Kohonen (1999). These latter problems and approaches
will not be further addressed here.

¹ Several approaches are known to estimate the Hausdorff dimension of data,
often called intrinsic dimension. One of the best-known methods is the
Grassberger-Procaccia analysis (GP) (Grassberger & Procaccia, 1983; Takens,
1985). For GP, there is a large number of investigations of statistical
properties (e.g., Camastra & Vinciarelli, 2001; Eckmann & Ruelle, 1992;
Liebert, 1991; Theiler, 1990). For a neural network approach to intrinsic
dimension estimation (based on NG), also in comparison to GP, we refer to
Bruske and Sommer (1998), Camastra and Vinciarelli (2001), Villmann,
Hermann, and Geyer (2000), Villmann (2002), and Villmann et al. (2003).

According to equations (3.3) and (3.1), the SOM minimizes the somewhat
exotic $E_{1/2}$ distortion error, whereas the NG minimizes the usual $E_2$
error.
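
As a worked check of these two statements, the coupling between
magnification and distortion exponent can be inverted. The derivation
assumes that equation (3.3) has the form $\alpha = d/(d+\gamma)$, which
agrees with the LBG and NG entries of Table 1; the SOM line uses the
one-dimensional value $\alpha_{\mathrm{SOM}} = 2/3$.

```latex
\begin{align*}
\alpha = \frac{d}{d+\gamma}
  \;&\Longrightarrow\;
  \gamma = d\,\frac{1-\alpha}{\alpha}, \\
\text{SOM } (d = 1,\ \alpha = \tfrac{2}{3}):\quad
  \gamma &= 1 \cdot \frac{1/3}{2/3} = \frac{1}{2}, \\
\text{NG } (\alpha = \tfrac{d}{d+2}):\quad
  \gamma &= d \cdot \frac{2/(d+2)}{d/(d+2)} = 2 .
\end{align*}
```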

Further, we can observe interesting relations to information-theoretic
properties of the mapping: the information transfer realized by the mapping
$\Psi_{V \to A}$ is, in general, not independent of the magnification of
the map (Zador, 1982). It has been derived that for a vector quantizer (or
a neural map in our context) realizing an optimal information transfer, the
relation $\alpha = 1$ holds (Brause, 1992). Vector quantizers designed to
achieve an optimal information transfer are the Linsker network (Linsker,
1989; see Table 1) and the optimal coding network approach proposed by
Brause (1994).

3.2 Magnification Control in Vector Quantization: A General 
Framework. 

As pointed out in section 1, different application tasks may require
different magnification properties of the vector quantizer, that is, the
magnification should be controlled. Put simply, magnification control means
changing the value of the magnification factor $\alpha$ for a given vector
quantizer by manipulating the basic approach.

Consequently, the question is how one can influence the learning to achieve
an a priori chosen magnification factor. We address this topic in the
following. First, we review results from the literature and put them into a
general framework.

The first approaches to influence the magnification of a vector quantizer
are models of conscience learning, characterized by a modified winner
determination. The algorithm by DeSieno (1988) and the frequency sensitive
competitive learning (FSCL) (Ahalt, Krishnamurthy, Chen, & Melton, 1990)
belong to this algorithm class. Originally, these approaches were proposed
for equalizing the winner probability of the neural units in SOM. However,
as the neighborhood relation between neurons is not used in these
approaches, they are applicable to any vector quantizer based on
winner-take-all learning. To achieve this goal, in the DeSieno model, a
bias term $B$ is inserted into the winner determination rule, equation
(2.1), such that

$$\Psi_{V \to A}: \mathbf{v} \mapsto s(\mathbf{v}) = \operatorname{argmin}_{i \in A} \left( \|\mathbf{v} - \mathbf{w}_i\| - B \right) \qquad (3.5)$$

with the bias term $B = \gamma \left( \frac{1}{N} - p_i \right)$, where
$p_i$ is the actual winning probability of the neuron $i$. The algorithm
converges such that the winning probabilities of all neurons are equalized,
which is related to a maximization of the entropy; hence, the resulting
magnification is equal to unity. However, an arbitrary magnification cannot
be achieved. Moreover, as pointed out in van Hulle (2000), the algorithm
shows unstable behavior. FSCL modifies the selection criterion for the
best-matching unit by a fairness term $F$, which is a function of the
winning frequency $\omega_i$ of the neurons. Again, the winner
determination is modified:

$$\Psi_{V \to A}: \mathbf{v} \mapsto s(\mathbf{v}) = \operatorname{argmin}_{i \in A} \left( F(\omega_i)\, \|\mathbf{v} - \mathbf{w}_i\| \right). \qquad (3.6)$$

As mentioned above, it was originally defined to achieve an equiprobable
quantization too. However, it was shown that this goal cannot be achieved
by the original version (Galanopoulos & Ahalt, 1996; van Hulle, 2000). Yet
for one-dimensional data, any given $\gamma$-norm error criterion, equation
(3.1), can be minimized by a specific choice of the fairness function: if
$F(\omega_i)$ is taken as

$$F(\omega_i) = (\omega_i)^\beta, \qquad (3.7)$$

then for the one-dimensional case a magnification $\alpha_{\mathrm{FSCL}} =
\frac{3\beta + 1}{3\beta + 3}$ is achieved, which is equivalent to $\gamma =
\frac{2}{3\beta + 1}$ (Galanopoulos & Ahalt, 1996). The difficulties of
transferring the one-dimensional result to higher dimensions are, however,
as prohibitive as in SOM.
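
The FSCL winner rule with a power-law fairness function can be sketched as
follows. This is an illustrative sketch only: the function names and the
bookkeeping list `wins` (how often each neuron has won so far) are
hypothetical, and the closed forms for the magnification and the equivalent
distortion exponent are the one-dimensional expressions
$(3\beta+1)/(3\beta+3)$ and $2/(3\beta+1)$ quoted from Galanopoulos and
Ahalt (1996).

```python
import math


def fscl_winner(v, W, wins, beta):
    """FSCL winner rule: argmin over F(omega_i) * ||v - w_i||,
    with the fairness function F(omega) = omega**beta.

    wins[i] is the number of times neuron i has won so far
    (hypothetical bookkeeping kept by the caller).
    """
    total = sum(wins)

    def cost(i):
        omega = wins[i] / total  # relative winning frequency of neuron i
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, W[i])))
        return omega ** beta * dist

    return min(range(len(W)), key=cost)


def alpha_fscl(beta):
    """One-dimensional FSCL magnification (3*beta + 1) / (3*beta + 3)."""
    return (3.0 * beta + 1.0) / (3.0 * beta + 3.0)


def gamma_fscl(beta):
    """Equivalent distortion exponent gamma = 2 / (3*beta + 1)."""
    return 2.0 / (3.0 * beta + 1.0)
```

For $\beta = 0$ the rule reduces to plain nearest-neighbor competition, and
`alpha_fscl(0.0)` gives 1/3, the one-dimensional LBG value $d/(d+2)$ from
Table 1. A neuron that has won often is penalized for $\beta > 0$, so a
more distant but rarely winning neuron can take the stimulus.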

We now study control possibilities to achieve arbitrary magnification,
focusing on SOM and NG by modification of the learning rule. We emphasize
again that for SOM, the results hold only for the one-dimensional case,
whereas for NG, they are valid for arbitrary dimensionality. The following
directions of modification of the general learning rule, equation (2.2),

$$\Delta \mathbf{w}_i = \epsilon\, h(i, \mathbf{v}, W)\, (\mathbf{v} - \mathbf{w}_i),$$

can serve as a general framework:

1. Localized learning: Introduction of a multiplicative factor by a local
learning rate $\epsilon_i$

2. Winner-relaxing learning: Introduction of winner relaxing by adding a
winner-enhancing (relaxing) term $R$

3. Concave-convex learning: Scaling of the learning shift by powers $\xi$
in the factor $(\mathbf{v} - \mathbf{w}_i)^\xi$

These three directions serve as axes for a taxonomy in the following
sections. We focus on SOM and NG as popular neural vector quantizers. We
explain, expand, and develop the respective methodologies of magnification
control for these models. The localized and the winner-relaxing learning
for SOM and NG are briefly reviewed. In particular, localized learning for
SOM was published in Bauer, Der, and Herrmann (1996), whereas
winner-relaxing learning for both SOM and NG and localized learning in NG
were previously developed by the authors (Claussen, 2003, 2005; Claussen &
Villmann, 2003a; Villmann, 2000). The concave-convex learning for SOM is
extended here to a more general approach compared to its origins (Zheng &
Greenleaf, 1996). The concave-convex learning for NG is new.

4 Controlling the Magnification in SOM 

Within the general framework outlined in section 3.2, we now consider the
three learning rule modifications for SOM.

4.1 Insertion of a Multiplicative Factor: Localized Learning. 

The first choice is to add a factor in the SOM learning rule. An established
realization is localized learning, the biological motivation of which is
the perceptual magnet effect (Bauer et al., 1996). For this purpose, an
adaptive local learning step size $\epsilon_{s(\mathbf{v})}$ is introduced
in equation (2.2) such that the new adaptation rule reads as

$$\Delta \mathbf{w}_i = \epsilon_{s(\mathbf{v})}\, h_\sigma(i, \mathbf{v}, W)\, (\mathbf{v} - \mathbf{w}_i), \qquad (4.1)$$

where $s(\mathbf{v})$ is the best-matching neuron with respect to equation
(2.1). The local learning rates $\epsilon_i = \epsilon(\mathbf{w}_i)$
depend on the stimulus density $P$ at the position of their weight vectors
$\mathbf{w}_i$ via

$$\langle \epsilon_i \rangle = \epsilon_0\, P(\mathbf{w}_i)^m, \qquad (4.2)$$

where the brackets $\langle \ldots \rangle$ denote the average in time. This
approach leads to the new magnification law,

$$\alpha'_{\mathrm{localSOM}} = \alpha_{\mathrm{SOM}} \cdot (m + 1), \qquad (4.3)$$

where $m$ appears as an explicit control parameter (Bauer et al., 1996).
Hence, an arbitrary predefined magnification can be achieved.

In applications, one has to estimate the generally unknown data distribution
$P$, which may lead to numerical instabilities of the control mechanism (van
Hulle, 2000).
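
Localized learning can be illustrated for a one-dimensional chain. The
sketch below makes several simplifying assumptions that are not part of the
original text: the density is supplied by the caller as a function
`density(w)` instead of being estimated, the time average in the learning
rate is replaced by the instantaneous value, neuron $i$ sits at lattice
position $i$, and the neighborhood is a Gaussian with $2\sigma^2$ in the
exponent. All function names are hypothetical.

```python
import math


def localized_som_step(v, W, sigma, eps0, m, density):
    """One localized-learning step for a one-dimensional SOM chain.

    W is a list of scalar weights; neuron i sits at lattice position i.
    density(w) is an assumed, caller-supplied estimate of P at w.
    """
    # Winner: neuron whose weight is closest to the stimulus v.
    s = min(range(len(W)), key=lambda i: abs(v - W[i]))
    # Local learning rate: eps0 * P(w_s)**m, taken at the winner.
    eps_local = eps0 * density(W[s]) ** m
    # Shift all weights toward v, weighted by the lattice neighborhood.
    return [w + eps_local * math.exp(-(i - s) ** 2 / (2.0 * sigma ** 2)) * (v - w)
            for i, w in enumerate(W)]


def control_parameter(alpha_base, alpha_target):
    """Solve alpha_target = alpha_base * (m + 1) for m."""
    return alpha_target / alpha_base - 1.0
```

With the one-dimensional value $\alpha_{\mathrm{SOM}} = 2/3$,
`control_parameter(2.0 / 3.0, 1.0)` gives $m = 1/2$ as the setting that
would yield a magnification of 1, and $m = 0$ recovers the standard SOM
step.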

4.2 Winner-Relaxing SOM and Magnification Control. 

Recently, a new approach for magnification control of the SOM was derived
by a generalization (Claussen, 2003, 2005) of the winner-relaxing
modification (Kohonen, 1991), giving a control scheme that is independent
of the shape of the data distribution (Claussen, 2005). We refer to this
algorithm as WRSOM.

In the original winner-relaxing SOM, an additional term occurs in learning
for the winning neuron only, implementing a relaxing behavior. The relaxing
force is a weighted sum of the differences between the weight vectors and
the input according to their neighborhood relation. The relaxing term was
introduced to obtain a learning dynamic for SOM according to an average
reconstruction error, taking into account the effect of shifting Voronoi
borders.

A winner-relaxing term $R(\mu, \kappa)$ is added to the original learning
rule,

$$\Delta \mathbf{w}_i = \epsilon\, h_\sigma(i, \mathbf{v}, W)\, (\mathbf{v} - \mathbf{w}_i) + R(\mu, \kappa), \qquad (4.4)$$

with $R(\mu, \kappa)$ being

$$R(\mu, \kappa) = (\mu + \kappa)\, (\mathbf{v} - \mathbf{w}_i)\, \delta_{is} - \kappa\, \delta_{is} \sum_j h_\sigma(j, \mathbf{v}, W)\, (\mathbf{v} - \mathbf{w}_j), \qquad (4.5)$$

depending on weighting parameters $\mu$ and $\kappa$. For $\mu = 0$ and
$\kappa = \frac{1}{2}$, the original winner-relaxing SOM is obtained
(Kohonen, 1991). Surprisingly, it has been shown that the magnification is
independent of $\mu$ (Claussen, 2003, 2005). Only the choice of $\kappa$
contributes to the magnification:

$$\alpha'_{\mathrm{WRSOM}} = \frac{2}{\kappa + 3}. \qquad (4.6)$$

The stability range is $|\kappa| \le 1$, which restricts the accessible
magnification range to $\frac{1}{2} \le \alpha'_{\mathrm{WRSOM}} \le 1$.
More detailed numerical simulations and a stability analysis can be found
in Claussen (2005).

The advantage of winner-relaxing learning is that no estimate of the
generally unknown data distribution has to be made, as required in the
localized learning approach above.
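
A winner-relaxing step for a one-dimensional chain might look as follows.
This is a sketch under assumptions, not the published algorithm verbatim:
the relaxing term is taken as $(\mu + \kappa)(v - w_s) - \kappa \sum_j h_j
(v - w_j)$ and applied to the winner only, it is scaled by the same
learning rate as the ordinary update (an assumption of this sketch), neuron
$i$ sits at lattice position $i$, and the magnification is taken as
$2/(\kappa + 3)$. The function names are hypothetical.

```python
import math


def wrsom_step(v, W, sigma, eps, mu, kappa):
    """One winner-relaxing step for a one-dimensional SOM chain.

    W is a list of scalar weights; neuron i sits at lattice position i.
    Assumption of this sketch: the relaxing term is scaled by eps.
    """
    s = min(range(len(W)), key=lambda i: abs(v - W[i]))  # winner
    h = [math.exp(-(i - s) ** 2 / (2.0 * sigma ** 2)) for i in range(len(W))]
    # Ordinary SOM shift for every neuron.
    new = [w + eps * hi * (v - w) for w, hi in zip(W, h)]
    # Relaxing term: diagonal part minus the neighborhood-weighted sum.
    relax = (mu + kappa) * (v - W[s]) \
        - kappa * sum(hj * (v - wj) for hj, wj in zip(h, W))
    new[s] += eps * relax  # applied to the winner only
    return new


def alpha_wrsom(kappa):
    """Magnification exponent 2 / (kappa + 3) of the WRSOM."""
    return 2.0 / (kappa + 3.0)
```

Setting `mu = 0.0` and `kappa = 0.0` recovers the plain SOM step with
exponent 2/3, while `alpha_wrsom(-1.0)` and `alpha_wrsom(1.0)` give the two
ends of the accessible range, 1 and 1/2.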

4.3 Concave-Convex Learning.

The third structural possibility for control according to our framework is
to apply concave or convex learning in the learning rule. This approach was
introduced in Zheng and Greenleaf (1996). Here, we extend this approach to
a more general variant.

Originally, an exponent $\xi$ is introduced in the general learning rule
such that equation (2.2) now reads as

$$\Delta \mathbf{w}_i = \epsilon\, h_\sigma(i, \mathbf{v}, W)\, (\mathbf{v} - \mathbf{w}_i)^\xi \qquad (4.7)$$

with

$$(\mathbf{v} - \mathbf{w}_i)^\xi := (\mathbf{v} - \mathbf{w}_i) \cdot \|\mathbf{v} - \mathbf{w}_i\|^{\xi - 1}. \qquad (4.8)$$

Thereby, two different possibilities are proposed: $\xi = \frac{1}{k}$ with
$k > 1$, $k \in \mathbb{N}$, and $k$ odd (convex learning), or one simply
takes $\xi > 1$, $\xi \in \mathbb{N}$, and $\xi$ odd (concave learning).
This gives the magnification

$$\alpha'_{\mathrm{concave/convexSOM}} = \frac{2}{\xi + 2} \qquad (4.9)$$

$$= \alpha_{\mathrm{SOM}} \cdot \frac{3}{\xi + 2}, \qquad (4.10)$$

which allows an explicit magnification control. Yet this approach allows
only a rather rough control around $\xi = 1$: the neighboring allowed values
are $\xi = \frac{1}{3}$ and $\xi = 3$, corresponding to magnifications
$\alpha'_{\mathrm{concave/convexSOM}} = \frac{6}{7}$ and
$\alpha'_{\mathrm{concave/convexSOM}} = \frac{2}{5}$, respectively.
Therefore, greater flexibility would be of interest.

For this purpose, we seek a generalization of both concave and convex
learning. As a more general choice we take $\xi$ to be real, that is, $\xi
\in \mathbb{R}$. If we do so, the same magnification equation (4.9) is
obtained. The proof of the magnification law is given in appendix A.
Obviously, the choices $\xi = \frac{1}{k}$ and $\xi = k > 1$, with $k \in
\mathbb{N}$ and $k$ odd, as made in Zheng and Greenleaf (1996) are special
cases of the now general approach.
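
The generalized learning shift for real exponents can be written compactly.
The sketch assumes that the shift is defined as $(\mathbf{v} - \mathbf{w})
\cdot \|\mathbf{v} - \mathbf{w}\|^{\xi - 1}$ and that the one-dimensional
SOM magnification is $2/(\xi + 2)$; the function names and the explicit
guard for a vanishing difference are choices of this sketch.

```python
import math


def power_shift(v, w, xi):
    """Generalized learning shift (v - w) * ||v - w||**(xi - 1)."""
    diff = [a - b for a, b in zip(v, w)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        # No shift when the stimulus coincides with the weight vector;
        # this also avoids 0**(negative) for xi < 1.
        return [0.0 for _ in diff]
    return [x * norm ** (xi - 1.0) for x in diff]


def alpha_cc_som(xi):
    """Magnification 2 / (xi + 2) for the one-dimensional SOM."""
    return 2.0 / (xi + 2.0)
```

The shift keeps the direction of $\mathbf{v} - \mathbf{w}$ and rescales its
length to $\|\mathbf{v} - \mathbf{w}\|^{\xi}$, so $\xi = 1$ reproduces the
standard rule. Evaluating `alpha_cc_som` at 1/3, 1, and 3 gives 6/7, 2/3,
and 2/5, and any real value in between is now admissible.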

We studied the numerical behavior of this magnification control using a
one-dimensional chain of 50 neurons. The data distribution was chosen in
agreement with Bauer et al. (1996) as $P(x) = \sin(\pi x)$. The theoretical
entropy maximum of the winning probabilities $p_i$ of the neurons is
$-\sum_{i=1}^{N} p_i \log(p_i) = \log(N)$, giving the value 3.912 for $N =
50$. The results in dependence on $\xi$ for different neighborhood ranges
$\sigma$ are depicted in Figure 1.
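
The entropy used as the quality measure in these simulations can be
computed from a histogram of winner counts. The helper below is a minimal
sketch; the function names and the list `win_counts` (number of wins per
neuron over a test run) are hypothetical.

```python
import math


def winner_entropy(win_counts):
    """Entropy -sum p_i log p_i of the empirical winning probabilities."""
    total = sum(win_counts)
    # Neurons that never win contribute nothing (0 log 0 is taken as 0).
    probs = [c / total for c in win_counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


def max_entropy(n_neurons):
    """Upper bound log(N), reached when all neurons win equally often."""
    return math.log(n_neurons)
```

For a chain of 50 neurons, `max_entropy(50)` evaluates to about 3.912, and
any unequal distribution of wins gives a smaller value of
`winner_entropy`.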

According to the theoretical prediction, the output entropy is maximized
for small $\xi$, and for large $\xi$, a magnification exponent of zero is
reached, corresponding to an equidistant codebook without adaptation to the
input distribution. For $\sigma < 1$, the turnover is shifted toward
smaller values of $\xi$, and for $\xi \ll 1$, $\sigma \ll 1$, fluctuations
increase.

Further, as in the WRSOM, the advantage of concave-convex learning is that
no estimate of the generally unknown data distribution has to be made, in
contrast to localized learning.

Figure 1: Output entropy for concave and convex learning. An input density
of $P(x) = \sin(\pi x)$ was presented to a one-dimensional chain of $N =
50$ neurons. Results are shown after $10^6$ learning steps of stochastic
sequential updating, averaged over $10^5$ inputs, with a fixed learning
rate $\epsilon = 0.01$.



5 Magnification Control in Neural Gas 

In this section we transfer the ideas of magnification control in SOM to 
the NG, keeping in mind the advantage that the results then are valid for 
any dimension. 

5.1 Multiplicative Factor - Localized Learning. 

The idea of localized learning is now applied to NG (Herrmann & Villmann,
1997). Hence, we have the localized learning rule

$$\Delta \mathbf{w}_i = \epsilon_{s(\mathbf{v})}\, h_\lambda(i, \mathbf{v}, W)\, (\mathbf{v} - \mathbf{w}_i), \qquad (5.1)$$

with $s(\mathbf{v})$ again being the best-matching neuron with respect to
equation (2.1) and $\epsilon_{s(\mathbf{v})}$ the local learning rate chosen
as in equation (4.2). This approach gives a result similar to that for SOM,

$$\alpha'_{\mathrm{localNG}} = \alpha_{\mathrm{NG}} \cdot (m + 1), \qquad (5.2)$$

and, hence, allows a magnification control (Villmann, 2000). However, we
have similar restrictions as for SOM: in actual applications one has to
estimate the generally unknown data distribution $P$.

The numerical study shows that the approach can also be used to increase
the mutual information of a map generated by an NG (Villmann, 2000). As for
WRSOM, we use a standard setup as in Villmann (2000) of 50 neurons and
$10^7$ training steps with a probability density $P(x_1, \ldots, x_d) =
\prod_i \sin(\pi x_i)$, $x_i \in [0, 1]$, and with parameters $\lambda =
1.5$ fixed and $\epsilon$ decaying from 0.5 to 0.05. The entropy of the
resulting map computed for input dimensions 1, 2, and 3 is plotted in
Figure 2.

Figure 2: Local learning for NG: Plot of the entropy $H$ for maps trained
with different magnification control parameters $m$ ($d = 1$ (o), $d = 2$
(+), $d = 3$ (□)). The arrows indicate the theoretical values of $m$ ($m =
2$, $m = 1$, $m = 2/3$, respectively) that maximize the entropy of the map.
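
The theoretical values of $m$ quoted for Figure 2 follow from requiring a
magnification of 1 in the localized NG law. The short check below uses
exact fractions; it assumes the plain NG exponent $d/(d+2)$ from Table 1
and the relation $\alpha' = \alpha_{\mathrm{NG}} (m + 1)$, and the function
names are hypothetical.

```python
from fractions import Fraction


def alpha_ng(d):
    """Magnification exponent of the plain NG, d / (d + 2)."""
    return Fraction(d, d + 2)


def m_for_target(alpha_base, alpha_target=Fraction(1)):
    """Solve alpha_target = alpha_base * (m + 1) for m."""
    return alpha_target / alpha_base - 1


for d in (1, 2, 3):
    # Control parameter that maximizes the entropy for dimension d.
    print(d, m_for_target(alpha_ng(d)))  # m = 2, 1, 2/3
```

In general the entropy-maximizing setting is $m = 2/d$, so the required
correction shrinks as the data dimension grows and the plain NG exponent
approaches 1.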

5.2 Winner-Relaxing NG. 

The winner-relaxing NG (WRNG) was first studied in Claussen and Villmann
(2003a). In analogy to the WRSOM approach, one adds a winner-relaxing term
$R(\mu, \kappa)$ to the original learning rule:

$$\Delta \mathbf{w}_i = \epsilon\, h_\lambda(i, \mathbf{v}, W)\, (\mathbf{v} - \mathbf{w}_i) + R(\mu, \kappa), \qquad (5.3)$$

with $R(\mu, \kappa)$ being as in equation (4.5). The resulting WRNG
magnification for small neighborhood values $\lambda$, with $\lambda \to 0$
but not vanishing, is given by Claussen and Villmann (2005):

$$\alpha'_{\mathrm{WRNG}} = \frac{1}{1 - \kappa} \cdot \frac{d}{d + 2}. \qquad (5.4)$$

Thereby, the magnification exponent appears to be independent of an
additional diagonal term (controlled by $\mu$) for the winner, the same as
in WRSOM; again $\mu = 0$ is the usual setting. If the same stability
borders $|\kappa| = 1$ of the WRSOM also apply here, one can expect to
increase the NG exponent by positive values of $\kappa$, or to lower the NG
exponent by a factor 1/2 for $\kappa = -1$.

However, one has to be cautious when transferring the $\lambda \to 0$
result obtained above (which would require increasing the number of neurons
as well) to a realistic situation where a decrease of $\lambda$ with time
will be limited to a final finite value to avoid the stability problems
found in Herrmann and Villmann (1997). For a finite $\lambda$, the maximal
coefficient $h_\lambda$ that contributes to the averaged learning shift is
given by the prefactor of the second but one winner, which is given by
$e^{-\lambda}$ (Claussen & Villmann, 2005). For the NG, however, the
neighborhood is defined by the rank list. As the winner term of the NG is
not present in the winner-relaxing term (for $\mu = 0$), all terms share
the factor $e^{-\lambda}$ by $h_\lambda(k) = e^{-\lambda} h_\lambda(k - 1)$,
which indicates that in the discretized algorithm $\kappa$ has to be
rescaled by $e^{+\lambda}$ to agree with the continuum theory. The
numerical investigation indicates that this prefactor applies for finite
$\lambda$ and a finite number of neurons. The scaling of the position of
the entropy maximum with input dimension is in agreement with theory, as is
the prediction of the opposite sign of $\kappa$ that has to be taken to
increase mutual information.

Numerical studies show that winner-relaxing learning can also be used to
increase the mutual information of an NG vector quantization. The entropy
shows a dimension-dependent maximum approximately at $\kappa =
\frac{2}{d+2}\, e^{\lambda}$ (see Figure 3). In any case, within a broad
range around the optimal $\kappa$, the entropy is close to the maximum.

The advantage of the method is that it is independent of an estimation of
the unknown data distribution, like its SOM equivalent, WRSOM. Further,
again as in the WRSOM, the magnification of WRNG is independent in first
order of the diagonal term, controlled by $\mu$. Numerical simulations have
shown that the contribution in higher orders is marginal (Claussen &
Villmann, 2003b). More pronounced is the influence of the diagonal term on
stability. According to the larger prefactor, no stable behavior has been
found for $\mu > 1$; therefore, $\mu = 0$ is the recommended setting
(Claussen & Villmann, 2005).

Figure 3: Winner-relaxing learning for NG: Plot of the entropy $H$ curves
for varying values of $\kappa$ for one-, two-, and three-dimensional data.
The entropy has its maximum if the magnification equals unity (Zador,
1982). The arrows indicate the $\kappa$-values for the respective data
dimensions.



5.3 Concave-Convex Learning.

We now consider the third modification known from SOM, the concave-convex
learning approach, but in its new, general variant,

$$\Delta \mathbf{w}_i = \epsilon\, h_\lambda(i, \mathbf{v}, W)\, (\mathbf{v} - \mathbf{w}_i)^\xi, \qquad (5.5)$$

with $\xi \in \mathbb{R}$ and the definition (4.8). It is proved in
appendix B that the resulting magnification is

$$\alpha'_{\mathrm{concave/convexNG}} = \frac{d}{\xi + 1 + d}, \qquad (5.6)$$

depending on the intrinsic data dimensionality $d$. This dependency is in
agreement with the usual magnification law of NG, which is also related to
the data dimension.

The respective numerical simulations with the parameter choice as before
are given in Figure 4. In contrast to concave-convex SOM, where $\alpha' =
1$ can be achieved for small $\xi$, here $\alpha'$ is bounded by
$\frac{d}{d+1}$, so information optimal learning is not possible in cases
of low-dimensional data.

Figure 4: Concave-convex learning for NG: Plot of the entropy $H$ curves
for varying values of $\xi$ for one-, two-, and three-dimensional data. The
entropy can be enhanced by convex learning in each case (dashed line: $d =
1$, with $10^8$ learning steps).



6 Discussion 

According to the given general framework, we studied three structurally
different approaches for magnification control in SOM and NG. All methods
are capable of controlling the magnification with more or less accuracy.
Yet they differ in properties (e.g., stability range, density estimation).
No approach yet shows a clear advantage. The choice of the optimal
algorithm may depend on the particular problem and implementation
constraints. In particular, several problems occur in actual applications.
First, in the SOM case, all results are valid only for the one-dimensional
case, because all investigations are based on the usual convergence
dynamic. However, the SOM dynamics is analytically treatable only in the
one-dimensional setting and in higher-dimensional cases that factorize.
Moving away from these special cases causes a systematic shift in
magnification control, as shown numerically in Jain and Merenyi (2004). In
actual applications, a quantitative comparison with theory is quite limited
due to several influences that are not easily tractable. First, the data
density has to be estimated, which is generally difficult (Merenyi & Jain,
2004); second, the intrinsic dimension has to be determined; and third, the
measurement of the magnification from the density of weight vectors is
rather coarse, especially in higher dimensions.
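Table 2: Comparison of Magnification Control for the Different Control
Approaches for SOM and NG ($d = 1$ for SOM).

                          SOM                                         NG
------------------------  ------------------------------------------  -------------------------------------------------
Local learning            $(m + 1)\,\alpha_{\mathrm{SOM}}$            $(m + 1)\,\alpha_{\mathrm{NG}}$
                          (Bauer et al., 1996)                        (Villmann, 2000)
Winner-relaxing learning  $\frac{2}{\kappa + 3}$                      $\frac{1}{1 - \kappa}\,\alpha_{\mathrm{NG}}$
                          (Claussen, 2003, 2005)                      (Claussen & Villmann, 2005)
Concave-convex learning   $\frac{3}{\xi + 2}\,\alpha_{\mathrm{SOM}}$  $\frac{d + 2}{\xi + d + 1}\,\alpha_{\mathrm{NG}}$
                          (in section 4.3;                            (in section 5.3)
                          Zheng & Greenleaf, 1996)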

Only some special cases can be handled adequately. In particular,
maximizing mutual information can be controlled easily by observation of
the entropy of the winning probabilities of the neurons or consideration
of inverted magnification in case of available auxiliary class
information, that is, labeled data (Merenyi & Jain, 2004). Thus, actual
applications have to be carried out carefully using some heuristics.
Interesting, successful applications of magnification control (by local
learning) in satellite remote sensing image analysis can be found in
Merenyi and Jain (2004), Villmann (1999), and Villmann et al. (2003).
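The entropy criterion mentioned above can be monitored with a few lines of
code. The sketch below is only an illustration under our own assumptions
(the function name `winner_entropy`, Euclidean winner determination, and
the toy data are not taken from the cited work): it counts how often each
unit wins and compares the entropy of these winning probabilities with its
upper bound $\log N$, which is reached when all $N$ units win equally
often.

```python
import numpy as np


def winner_entropy(data, weights):
    """Entropy (in nats) of the winning probabilities of a codebook.

    data    : array of shape (n_samples, d)
    weights : array of shape (n_units, d)
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Squared Euclidean distance of every sample to every weight vector.
    dist2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    winners = dist2.argmin(axis=1)
    counts = np.bincount(winners, minlength=len(weights))
    p = counts / counts.sum()
    p = p[p > 0]                           # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())


# Toy check: 10 units centered in 10 equal bins of a uniform 1-D sample.
samples = ((np.arange(1000) + 0.5) / 1000.0).reshape(-1, 1)
units = ((np.arange(10) + 0.5) / 10.0).reshape(-1, 1)
h = winner_entropy(samples, units)
h_max = np.log(len(units))                 # bound reached for equiprobable winners
```

During training, the control parameter could be adjusted heuristically
until `h` approaches `h_max`; how to do this robustly is not specified
here.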

Summarizing the above approaches to magnification control, we obtain the
good news that the possibilities for magnification control known from SOM
can be successfully transferred to NG learning in all three cases. The
achieved theoretical magnifications are collected in Table 2.

The interesting point is that the local learning approach, as well as
concave-convex learning, yields structurally similar modification factors
for the new magnification. However, a magnification of 1 is not reachable
by concave-convex learning in the case of NG. In the case of the
winner-relaxing approach, we have a remarkable difference: in contrast to
the WRSOM, where the relaxing term has to be inverted ($\kappa < 0$) to
increase the magnification exponent, for the NG, positive values of
$\kappa$ are required to increase the magnification factor.
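The comparison of the local and the concave-convex approach can be made
concrete with a small numerical sketch. It assumes the exponents of the
unmodified algorithms, $\alpha_{SOM} = 2/3$ (one-dimensional case) and
$\alpha_{NG} = d/(d+2)$, the local learning law $(m+1)\,\alpha$ from Table
2, and the concave-convex laws in the form $2/(2+\xi)$ for SOM (appendix A)
and $d/(\xi+d+1)$ for NG, which is our reading of the result of appendix
B. All function names are hypothetical.

```python
def alpha_local(m, alpha_base):
    """Local learning: the base exponent is scaled by (m + 1)."""
    return (m + 1.0) * alpha_base


def alpha_cc_som(xi):
    """Concave-convex SOM, one-dimensional case (appendix A)."""
    return 2.0 / (2.0 + xi)


def alpha_cc_ng(xi, d):
    """Concave-convex NG in d dimensions (form assumed from appendix B)."""
    return d / (xi + d + 1.0)


def m_for_target(alpha_target, alpha_base):
    """Local learning control parameter m that yields alpha_target."""
    return alpha_target / alpha_base - 1.0


d = 2
alpha_ng = d / (d + 2.0)                   # unmodified NG
m_infomax = m_for_target(1.0, alpha_ng)    # m needed for an exponent of 1
# For xi > 0 the concave-convex NG exponent stays below d / (d + 1) < 1.
cc_ng_values = [alpha_cc_ng(xi, d) for xi in (0.01, 0.5, 1.0, 2.0)]
```

Under these assumptions, local learning reaches an exponent of 1 for a
finite $m$, whereas the concave-convex NG exponent is bounded by $d/(d+1)$
for every $\xi > 0$.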

Appendix A: Magnification Law of the Generalized Concave-Convex
Learning for the Self-Organizing Map

In this appendix we prove the magnification law of the generalized
concave-convex learning for SOM. The exponent in equation (4.7) is
required to be $\xi \in \mathbb{R}$, and we further keep in mind the
definition (4.8). Since the convergence proofs of SOM are valid only for
the one-dimensional setting, we switch from $\mathbf{w}$ to $w$ and from
$\mathbf{v}$ to $v$.
In the continuum approach, we can replace the index of the neuron by its
position or location $r$ (Ritter et al., 1992). Further, the neighborhood
function $h_\sigma$ depends only on the difference between the location
$r$ and $s(v)$, the location of the winning neuron. Then, in the
equilibrium of the learning rule in equation (4.7), we have

$$\int h_\sigma(r - s(v))\,(v - w(r))^{\xi}\,P(v)\,dv = 0. \tag{A.1}$$

We perform the usual approach of expanding the integrand in a Taylor
series in powers of $q = s(v) - r$ and evaluating at $r$ (Ritter &
Schulten, 1986; Hertz, Krogh, & Palmer, 1991; Zheng & Greenleaf, 1996).
This gives

$$v = w(r + q), \tag{A.2}$$

$h_\sigma(s(v) - r)$ becomes $h_\sigma(q) = h_\sigma(-q)$, and

$$P(v) = P(w(r + q)) \approx P(w) + q\,P'(w)\,w'(r). \tag{A.3}$$

Further, $dv = dw(r + q) = w'(r + q)\,dq$ can be rewritten as

$$w'(r + q)\,dq \approx (w' + q\,w'')\,dq, \tag{A.4}$$

and for $v - w(r) = w(r + q) - w(r)$ we get

$$w(r + q) - w(r) \approx q\,w' + \frac{q^2}{2}\,w'' = q\left(w' + \frac{q}{2}\,w''\right). \tag{A.5}$$

Because of $(v - w(r))^{\xi}$ in equation (A.1), we consider
$\left(w' + \frac{q}{2}\,w''\right)^{\xi}$:

$$\left(w' + \frac{q}{2}\,w''\right)^{\xi} \approx (w')^{\xi}\left(1 + \frac{\xi}{2}\,w''\,(w')^{-1}\,q\right). \tag{A.6}$$

Further, because of the definition (4.8), the power $q^{\xi}$ has to be
interpreted as

$$q^{\xi} = q \cdot |q|^{\xi - 1}, \tag{A.7}$$

which is an odd function in $q$.
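The odd interpretation of the power can be illustrated by a short sketch
of a single update of a one-dimensional SOM chain. The code is only an
illustration under stated assumptions: the names `signed_power` and
`cc_som_step`, the Gaussian neighborhood, and the default values of `eps`
and `sigma` are ours. The power is written as $\mathrm{sign}(x)\,|x|^{\xi}$,
which equals $x \cdot |x|^{\xi-1}$ for $x \neq 0$ and avoids a division by
zero at $x = 0$ when $\xi < 1$.

```python
import numpy as np


def signed_power(x, xi):
    """x**xi interpreted as x * |x|**(xi - 1), i.e. sign(x) * |x|**xi."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.abs(x) ** xi


def cc_som_step(w, v, xi, eps=0.1, sigma=1.0):
    """One concave-convex update of a 1-D SOM chain for a scalar stimulus v.

    w : 1-D array of weights, one per lattice position r = 0, 1, ..., N - 1
    """
    w = np.asarray(w, dtype=float)
    r = np.arange(len(w))
    s = np.argmin(np.abs(v - w))                       # winner position s(v)
    h = np.exp(-((r - s) ** 2) / (2.0 * sigma ** 2))   # Gaussian neighborhood (assumed)
    return w + eps * h * signed_power(v - w, xi)
```

For `xi = 1` the step reduces to the usual SOM update; `xi < 1` gives
concave and `xi > 1` convex learning in this reading.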



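Collecting now (A.2) through (A.7), we get in (A.1)

$$0 = \int h_\sigma(q) \cdot q \cdot |q|^{\xi-1} \cdot (w')^{\xi} \cdot \left(1 + \frac{\xi}{2}\,w''\,(w')^{-1}\,q\right) \times \left(P(w) + q\,P'(w)\,w'(r)\right)(w' + q\,w'')\,dq. \tag{A.8}$$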
Since $q \cdot |q|^{\xi-1}$ is odd, the term of lowest order in $q$
vanishes according to the rotational symmetry of $h_\sigma(q)$. Further,
in our approximation, we ignore terms beyond $q^2$. Hence, the above
equation can be simplified as

$$0 = (w')^{\xi}\left(P'(w)\,(w')^{2} + \frac{\xi+2}{2}\,P(w)\,w''\right)\int h_\sigma(q) \cdot q^{2} \cdot |q|^{\xi-1}\,dq. \tag{A.9}$$



From there we get

$$\frac{dr}{dw} \propto P(w)^{\frac{2}{2+\xi}} \tag{A.10}$$

and, hence,

$$\alpha'_{\text{concave/convexSOM}} = \frac{2}{2+\xi}, \tag{A.11}$$



which completes the proof. 
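The step from (A.9) to (A.10) can be spelled out as follows. This is our
reading of the omitted algebra; it uses only the symbols defined above and
assumes $w' > 0$ and a nonvanishing integral in (A.9).

```latex
\begin{align*}
  % the bracket in (A.9) has to vanish
  P'(w)\,(w')^{2} &= -\frac{\xi+2}{2}\,P(w)\,w'' \\
  % divide by P(w)\,w' and use dP/dr = P'(w)\,w'
  \frac{d}{dr}\ln P &= -\frac{\xi+2}{2}\,\frac{d}{dr}\ln w' \\
  % integrate with respect to r
  w' &\propto P^{-\frac{2}{\xi+2}}
  \quad\Longrightarrow\quad
  \frac{dr}{dw} = \frac{1}{w'} \propto P^{\frac{2}{2+\xi}}
\end{align*}
```

For $\xi = 1$ the exponent reduces to $2/3$, the value known for the
unmodified one-dimensional SOM.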



Appendix B: Magnification Law of the Generalized Concave-Convex
Learning for Neural Gas

For the derivation of the magnification for the generalized concave-convex
learning in the case of magnification-controlled NG, we first make the
usual continuum assumption (Ritter et al., 1992). The further treatment is
in complete analogy to the derivation of the magnification in the usual NG
(Martinetz et al., 1993). Let $\mathbf{r}$ be the difference vector

$$\mathbf{r} = \mathbf{v} - \mathbf{w}_i. \tag{B.1}$$

The winning rank $k_i(\mathbf{v}, W)$ in the neighborhood function
$h_\lambda(i, \mathbf{v}, W)$ in equation (2.3) depends only on
$\mathbf{r}$; therefore, we introduce the new variable

$$\mathbf{x}(\mathbf{r}) = \hat{\mathbf{r}} \cdot k_i(\mathbf{r})^{\frac{1}{d}}, \tag{B.2}$$

which can be assumed to be monotonically increasing with $\|\mathbf{r}\|$.
We define the $d \times d$ Jacobian

$$J(\mathbf{x}) = \det\left(\frac{\partial r_k}{\partial x_l}\right). \tag{B.3}$$
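Starting from the new learning rule,

$$\Delta \mathbf{w}_i = \epsilon\,h_\lambda(i, \mathbf{v}, W)\,(\mathbf{v} - \mathbf{w}_i)^{\xi}, \tag{B.4}$$

we again consider the averaged change,

$$\langle \Delta \mathbf{w}_i \rangle = \epsilon \int P(\mathbf{v})\,h_\lambda(i, \mathbf{v}, W)\,(\mathbf{v} - \mathbf{w}_i)^{\xi}\,d\mathbf{v}. \tag{B.5}$$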
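A single step of this learning rule can be sketched as follows. The code
is an illustration under stated assumptions, not a reference
implementation: the name `cc_ng_step` and the default values of `eps` and
`lam` are ours, the neighborhood is taken as the usual rank-based
exponential $\exp(-k_i/\lambda)$ of NG, and the vector power is read as
$(\mathbf{v} - \mathbf{w}_i)\,\|\mathbf{v} - \mathbf{w}_i\|^{\xi-1}$.

```python
import numpy as np


def cc_ng_step(W, v, xi, eps=0.05, lam=1.0):
    """One concave-convex neural gas update for a single stimulus v.

    W : array of shape (n_units, d), v : array of shape (d,)
    """
    W = np.asarray(W, dtype=float)
    diff = v - W                                   # v - w_i for every unit
    dist = np.linalg.norm(diff, axis=1)
    rank = np.argsort(np.argsort(dist))            # k_i = 0 for the closest unit
    h = np.exp(-rank / lam)                        # rank-based neighborhood
    # (v - w_i)**xi read as (v - w_i) * ||v - w_i||**(xi - 1); this vector
    # reading of the power is an assumption of this sketch.
    norm = np.where(dist > 0, dist, 1.0)           # avoid 0**negative for xi < 1
    powered = diff * (norm ** (xi - 1.0))[:, None]
    return W + eps * h[:, None] * powered
```

For `xi = 1` the step coincides with the standard NG update, which gives a
simple consistency check.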

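If $h_\lambda(i, \mathbf{v}, W)$ in equation (2.3) rapidly decreases to
zero with increasing $\mathbf{r}$, we can replace the quantities
$\mathbf{r}(\mathbf{x})$, $J(\mathbf{x})$ by the first terms of their
respective Taylor expansions around the point $\mathbf{x} = 0$, neglecting
higher derivatives. We obtain

$$\mathbf{x}(\mathbf{r}) = \mathbf{r}\,\left(\tau_d\,\rho(\mathbf{w}_i)\right)^{\frac{1}{d}}\left(1 + \frac{\mathbf{r}\cdot\partial_{\mathbf{r}}\rho(\mathbf{w}_i)}{d\cdot\rho(\mathbf{w}_i)} + O(\mathbf{r}^2)\right), \tag{B.6}$$

which corresponds to

$$\mathbf{r}(\mathbf{x}) = \mathbf{x}\,\left(\tau_d\,\rho(\mathbf{w}_i)\right)^{-\frac{1}{d}}\left(1 - \left(\tau_d\,\rho(\mathbf{w}_i)\right)^{-\frac{1}{d}}\,\frac{\mathbf{x}\cdot\partial_{\mathbf{r}}\rho(\mathbf{w}_i)}{d\cdot\rho(\mathbf{w}_i)} + O(\mathbf{x}^2)\right), \tag{B.7}$$

with

$$\tau_d = \frac{\pi^{\frac{d}{2}}}{\Gamma\left(\frac{d}{2} + 1\right)} \tag{B.8}$$

as the volume of a $d$-dimensional unit sphere (Martinetz et al., 1993).
We define $\varphi = \tau_d\,\rho(\mathbf{w}_i)$. Further, we expand
$J(\mathbf{x})$ and obtain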



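$$J(\mathbf{x}) = J(0) + x_k\,\left.\frac{\partial J}{\partial x_k}\right|_{\mathbf{x}=0} + \ldots \tag{B.9}$$

$$\phantom{J(\mathbf{x})} = \frac{1}{\varphi}\left(1 - \varphi^{-\frac{1}{d}}\left(1 + \frac{1}{d}\right)\mathbf{x}\cdot\frac{\partial_{\mathbf{r}}\rho}{\rho} + \ldots\right) \tag{B.10}$$

and, hence,

$$\left.\frac{\partial J}{\partial \mathbf{x}}\right|_{\mathbf{x}=0} = -\varphi^{-\left(1+\frac{1}{d}\right)}\left(1 + \frac{1}{d}\right)\frac{\partial_{\mathbf{r}}\rho}{\rho}. \tag{B.11}$$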
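The constant $\tau_d$ of (B.8) and the leading-order factor
$(\tau_d\,\rho)^{1/d}$ that relates $\mathbf{x}$ to $\mathbf{r}$ in (B.6)
are easy to check numerically. The following sketch uses hypothetical
function names and only the Python standard library.

```python
import math


def unit_ball_volume(d):
    """Volume tau_d = pi**(d/2) / Gamma(d/2 + 1) of the d-dimensional unit ball."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def rank_coordinate_scale(d, rho):
    """Leading-order factor (tau_d * rho)**(1/d) relating x to r in (B.6)."""
    return (unit_ball_volume(d) * rho) ** (1.0 / d)
```

The familiar special cases are recovered: length 2 of the interval
$[-1, 1]$ for $d = 1$, area $\pi$ for $d = 2$, and volume $4\pi/3$ for
$d = 3$.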
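After collecting all replacements, (B.5) becomes

$$\langle \Delta \mathbf{w}_i \rangle = \epsilon \cdot \varphi^{-\left(1+\frac{\xi}{d}\right)} \int_V d\mathbf{x}\; h_\lambda(\mathbf{x}) \cdot \mathbf{x}^{\xi} \cdot \left(P + \varphi^{-\frac{1}{d}} \cdot \mathbf{x} \cdot \partial_{\mathbf{r}} P + \ldots\right) \tag{B.12}$$

$$\cdot \left(1 - \varphi^{-\frac{1}{d}}\left(1 + \frac{1}{d}\right) \cdot \mathbf{x} \cdot \frac{\partial_{\mathbf{r}}\rho}{\rho} + \ldots\right) \tag{B.13}$$

$$\cdot \left(1 - \varphi^{-\frac{1}{d}} \cdot \mathbf{x} \cdot \frac{\partial_{\mathbf{r}}\rho}{d \cdot \rho} + \ldots\right)^{\xi}, \tag{B.14}$$

with new integration variable $\mathbf{x}$. We use the approximation

$$\left(1 - \varphi^{-\frac{1}{d}} \cdot \mathbf{x} \cdot \frac{\partial_{\mathbf{r}}\rho}{d \cdot \rho} + \ldots\right)^{\xi} \approx 1 - \xi\,\varphi^{-\frac{1}{d}} \cdot \mathbf{x} \cdot \frac{\partial_{\mathbf{r}}\rho}{d \cdot \rho} + \ldots \tag{B.15}$$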

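and get

$$\langle \Delta \mathbf{w}_i \rangle = \epsilon \cdot \varphi^{-\left(1+\frac{\xi}{d}\right)} \int_V d\mathbf{x}\; h_\lambda(\mathbf{x}) \cdot \mathbf{x}^{\xi} \cdot \left(P + \varphi^{-\frac{1}{d}} \cdot \mathbf{x} \cdot \partial_{\mathbf{r}} P + \ldots\right) \tag{B.16}$$

$$\cdot \left(1 - \varphi^{-\frac{1}{d}}\left(1 + \frac{1}{d}\right) \cdot \mathbf{x} \cdot \frac{\partial_{\mathbf{r}}\rho}{\rho} + \ldots\right) \tag{B.17}$$

$$\cdot \left(1 - \frac{\xi}{d}\,\varphi^{-\frac{1}{d}} \cdot \mathbf{x} \cdot \frac{\partial_{\mathbf{r}}\rho}{\rho} + \ldots\right). \tag{B.18}$$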
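In the equilibrium $\langle \Delta \mathbf{w}_i \rangle = 0$, we have

$$0 = \int_V d\mathbf{x}\; h_\lambda(\mathbf{x}) \cdot \mathbf{x}^{\xi} \cdot \left(P + \varphi^{-\frac{1}{d}} \cdot \mathbf{x} \cdot \partial_{\mathbf{r}} P + \ldots\right) \cdot \left(1 - \varphi^{-\frac{1}{d}}\left(1 + \frac{1}{d}\right) \cdot \mathbf{x} \cdot \frac{\partial_{\mathbf{r}}\rho}{\rho} + \ldots\right) \tag{B.19}$$

$$\cdot \left(1 - \frac{\xi}{d}\,\varphi^{-\frac{1}{d}} \cdot \mathbf{x} \cdot \frac{\partial_{\mathbf{r}}\rho}{\rho} + \ldots\right). \tag{B.20}$$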

Because of the rotational symmetry of $h_\lambda$, we can neglect odd
power terms in $\mathbf{x}$. Remaining terms are of even power order.
Again, according to equation (4.8), we take
$\mathbf{x}^{\xi} = \mathbf{x} \cdot |\mathbf{x}|^{\xi-1}$, and, hence,
$\mathbf{x}^{\xi}$ itself acts as an odd term. Therefore, only terms
containing $\mathbf{x}^{\xi+k}$ with odd $k$ contribute. Finally,
considering the nonvanishing terms and neglecting higher-order terms, we
find the relation

$$\rho(\mathbf{w}) \propto P(\mathbf{w})^{\alpha'_{\text{concave/convexNG}}} \quad \text{with} \quad \alpha'_{\text{concave/convexNG}} = \frac{d}{\xi + d + 1},$$

which is the desired result.
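For orientation, the bookkeeping of the nonvanishing terms can be sketched
as follows. This is our own collection of the terms of order
$\mathbf{x}^{\xi}\,\mathbf{x}$ in the expansion of (B.19) and (B.20); it
assumes that the corresponding moment of $h_\lambda$ does not vanish and
that $\partial_{\mathbf{r}}P$ and $\partial_{\mathbf{r}}\rho$ are parallel,
as required for a power-law relation between $\rho$ and $P$.

```latex
\begin{align*}
  % terms linear in x beyond the odd factor x^{\xi}
  0 &= \partial_{\mathbf{r}} P
       - P\left(1 + \frac{1}{d} + \frac{\xi}{d}\right)
         \frac{\partial_{\mathbf{r}}\rho}{\rho} \\
  % rearranged as a relation between logarithmic derivatives
  \frac{\partial_{\mathbf{r}} P}{P}
    &= \frac{\xi + d + 1}{d}\,\frac{\partial_{\mathbf{r}}\rho}{\rho}
\end{align*}
```

For $\xi = 1$ the factor becomes $(d+2)/d$, which corresponds to the
exponent $d/(d+2)$ of the unmodified NG (Martinetz et al., 1993).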